Automatic Discovery of Term Similarities Using Pattern Mining

Authors

  • Goran Nenadić
  • Irena Spasić
  • Sophia Ananiadou

Abstract

Term recognition and clustering are key topics in automatic knowledge acquisition and text mining. In this paper we present a novel approach to the automatic discovery of term similarities, which serves as a basis for both classification and clustering of domain-specific concepts represented by terms. The method is based on automatic extraction of significant patterns in which terms tend to appear. The approach is domain independent: it needs no manual description of domain-specific features, and it is based on knowledge-poor processing of specific term features. However, the automatically collected patterns are domain specific and identify significant contexts in which terms are used. Besides features that represent contextual patterns, we use lexical and functional similarities between terms to define a combined similarity measure. The approach has been tested and evaluated in the domain of molecular biology, and preliminary results are presented.
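The idea of a combined similarity measure can be illustrated with a minimal sketch. This is not the authors' implementation: the word-window context patterns, the `min_count` frequency threshold, the Dice coefficient, and the equal weights `w_lex`/`w_ctx` are all assumptions chosen for illustration, and functional similarity is left out entirely. Terms are assumed to be already recognised and written as single tokens joined by underscores.

```python
from collections import Counter

# Sketch only (assumption): terms are pre-recognised and written as single
# tokens joined by underscores, e.g. "nuclear_receptor".


def context_patterns(sentences, term, window=2):
    """Count left/right word-window contexts around each occurrence of term."""
    patterns = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            left = tuple(tokens[max(0, i - window):i])
            right = tuple(tokens[i + 1:i + 1 + window])
            if left:
                patterns[("L",) + left] += 1
            if right:
                patterns[("R",) + right] += 1
    return patterns


def significant_patterns(sentences, terms, min_count=2):
    """Keep only patterns seen at least min_count times across all terms."""
    per_term = {t: context_patterns(sentences, t) for t in terms}
    totals = Counter()
    for counts in per_term.values():
        totals.update(counts)
    keep = {p for p, c in totals.items() if c >= min_count}
    return {t: {p for p in counts if p in keep} for t, counts in per_term.items()}


def dice(a, b):
    """Dice coefficient of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def lexical_similarity(t1, t2):
    """Overlap of the words (lexical constituents) that make up two terms."""
    return dice(set(t1.split("_")), set(t2.split("_")))


def combined_similarity(t1, t2, patterns, w_lex=0.5, w_ctx=0.5):
    """Weighted sum of lexical and contextual similarity; weights are hypothetical."""
    contextual = dice(patterns[t1], patterns[t2])
    return w_lex * lexical_similarity(t1, t2) + w_ctx * contextual


if __name__ == "__main__":
    corpus = [s.split() for s in [
        "activation of nuclear_receptor in cells",
        "activation of steroid_receptor in cells",
        "binding of nuclear_receptor to DNA",
        "binding of steroid_receptor to DNA",
    ]]
    terms = ["nuclear_receptor", "steroid_receptor"]
    pats = significant_patterns(corpus, terms)
    print(combined_similarity(*terms, pats))  # prints 0.75
```

In this toy corpus the two receptor terms share every retained context pattern (contextual similarity 1.0) but only one of their two words (lexical similarity 0.5), so the combined score is 0.75. A fuller system would also need the functional similarity the abstract mentions and a principled way of setting the weights.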

Similar resources

Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-scale R&D IT projects depend on many sub-technologies, which need to be understood and have their risks analysed before the project begins if it is to succeed. When planning such an industrial-scale project, the list of technologies and their associations with each other are often complex and form a network. Discovery of this network of technologies is time consumi...

Mining Term Similarities from Corpora*

In this article we present an approach to the automatic discovery of term similarities, which may serve as a basis for a number of term-oriented knowledge mining tasks. The method for term comparison combines internal (lexical similarity) and two types of external criteria (syntactic and contextual similarities). Lexical similarity is based on sharing lexical constituents (i.e. term heads and m...

Temporal Databases and Frequent Pattern Mining Techniques

Data mining is the process of exploring and analyzing data from different perspectives, using automatic or semi-automatic techniques to extract knowledge or useful information and to discover correlations or meaningful patterns and rules in large databases. One of the most vital characteristics missing from traditional data mining systems is the capability to record and process time-varying aspe...

Discovery of Frequent Patterns from Web Log Data by using FP-Growth algorithm for Web Usage Mining

Web usage mining refers to the automatic discovery and analysis of patterns in clickstream and associated data collected or generated as a result of user interactions with web resources on one or more web sites. It consists of three phases: data preprocessing, pattern discovery and pattern analysis. In the pattern discovery phase, frequent pattern discovery algorithms applied on raw d...

Tactical Analysis Modeling through Data Mining - Pattern Discovery in Racket Sports

We explore pattern discovery within the game of tennis. To this end, we formalize events in a match, and define similarities for events and event sequences. We then proceed by looking at unbalancing events and their immediate prequel (using pattern masks) and sequel (using nondeterministic finite automata). Structured in this way, the data can be effectively mined, and a similar approach might ...

Publication date: 2002